Benchmark GPUArrays AK reduction implementation #2815
Codecov Report
All modified and coverable lines are covered by tests ✅
Additional details and impacted files
@@             Coverage Diff             @@
##           master    #2815       +/-   ##
===========================================
- Coverage   89.62%   75.06%   -14.57%
===========================================
  Files         153      153
  Lines       13276    13213       -63
===========================================
- Hits        11899     9918     -1981
- Misses       1377     3295     +1918
Your PR requires formatting changes to meet the project's style guidelines. Suggested changes:
diff --git a/perf/runbenchmarks.jl b/perf/runbenchmarks.jl
index ad2564a7d..0b84d307d 100644
--- a/perf/runbenchmarks.jl
+++ b/perf/runbenchmarks.jl
@@ -1,6 +1,6 @@
# benchmark suite execution and codespeed submission
using Pkg
-Pkg.add(url="https://github.com/christiangnrd/GPUArrays.jl", rev="akreduce")
+Pkg.add(url = "https://github.com/christiangnrd/GPUArrays.jl", rev = "akreduce")
using CUDA
[only benchmarks]
CUDA.jl Benchmarks
| Benchmark suite | Current: 8fd59db | Previous: e561e7a | Ratio |
|---|---|---|---|
| latency/precompile | 43267306340.5 ns | 43393378645 ns | 1.00 |
| latency/ttfp | 6993590450 ns | 7099882121 ns | 0.99 |
| latency/import | 3573563594 ns | 3463869374 ns | 1.03 |
| integration/volumerhs | 9628360.5 ns | 9623663 ns | 1.00 |
| integration/byval/slices=1 | 146972 ns | 146714 ns | 1.00 |
| integration/byval/slices=3 | 425877 ns | 425787 ns | 1.00 |
| integration/byval/reference | 145006 ns | 144967 ns | 1.00 |
| integration/byval/slices=2 | 286766 ns | 286209 ns | 1.00 |
| integration/cudadevrt | 103536 ns | 103426 ns | 1.00 |
| kernel/indexing | 14132 ns | 14196 ns | 1.00 |
| kernel/indexing_checked | 14886 ns | 14906 ns | 1.00 |
| kernel/occupancy | 693.9054054054054 ns | 759.2189781021898 ns | 0.91 |
| kernel/launch | 2168.8888888888887 ns | 2287.222222222222 ns | 0.95 |
| kernel/rand | 14649 ns | 15792 ns | 0.93 |
| array/reverse/1d | 19757.5 ns | 19624 ns | 1.01 |
| array/reverse/2d | 24702 ns | 24928.5 ns | 0.99 |
| array/reverse/1d_inplace | 10355 ns | 10448 ns | 0.99 |
| array/reverse/2d_inplace | 11869.5 ns | 12006 ns | 0.99 |
| array/copy | 20679 ns | 20990 ns | 0.99 |
| array/iteration/findall/int | 157631 ns | 159128.5 ns | 0.99 |
| array/iteration/findall/bool | 139481 ns | 139832 ns | 1.00 |
| array/iteration/findfirst/int | 136892 ns | 162546 ns | 0.84 |
| array/iteration/findfirst/bool | 130429 ns | 164393.5 ns | 0.79 |
| array/iteration/scalar | 72427 ns | 72740 ns | 1.00 |
| array/iteration/logical | 206893 ns | 216803.5 ns | 0.95 |
| array/iteration/findmin/1d | 117265.5 ns | 45968 ns | 2.55 |
| array/iteration/findmin/2d | 258985.5 ns | 96433 ns | 2.69 |
| array/reductions/reduce/Int64/1d | 46056.5 ns | 44555 ns | 1.03 |
| array/reductions/reduce/Int64/dims=1 | 87317 ns | 48607 ns | 1.80 |
| array/reductions/reduce/Int64/dims=2 | 47322 ns | 63682.5 ns | 0.74 |
| array/reductions/reduce/Int64/dims=1L | 167957.5 ns | 88842 ns | 1.89 |
| array/reductions/reduce/Int64/dims=2L | 1158683 ns | 89417.5 ns | 12.96 |
| array/reductions/reduce/Float32/1d | 43684 ns | 34490 ns | 1.27 |
| array/reductions/reduce/Float32/dims=1 | 89077 ns | 50554 ns | 1.76 |
| array/reductions/reduce/Float32/dims=2 | 43317 ns | 59726 ns | 0.73 |
| array/reductions/reduce/Float32/dims=1L | 116708.5 ns | 52852 ns | 2.21 |
| array/reductions/reduce/Float32/dims=2L | 1107589 ns | 70052.5 ns | 15.81 |
| array/reductions/mapreduce/Int64/1d | 46570 ns | 45547 ns | 1.02 |
| array/reductions/mapreduce/Int64/dims=1 | 87306 ns | 48423.5 ns | 1.80 |
| array/reductions/mapreduce/Int64/dims=2 | 47437 ns | 61443 ns | 0.77 |
| array/reductions/mapreduce/Int64/dims=1L | 167764 ns | 88888 ns | 1.89 |
| array/reductions/mapreduce/Int64/dims=2L | 1161470 ns | 87908.5 ns | 13.21 |
| array/reductions/mapreduce/Float32/1d | 44861 ns | 34245.5 ns | 1.31 |
| array/reductions/mapreduce/Float32/dims=1 | 89153 ns | 47287 ns | 1.89 |
| array/reductions/mapreduce/Float32/dims=2 | 43310 ns | 59743 ns | 0.72 |
| array/reductions/mapreduce/Float32/dims=1L | 117063 ns | 53154 ns | 2.20 |
| array/reductions/mapreduce/Float32/dims=2L | 1106921 ns | 70503 ns | 15.70 |
| array/broadcast | 20460 ns | 20866 ns | 0.98 |
| array/copyto!/gpu_to_gpu | 11132 ns | 12817 ns | 0.87 |
| array/copyto!/cpu_to_gpu | 215248 ns | 213873 ns | 1.01 |
| array/copyto!/gpu_to_cpu | 282942.5 ns | 284406 ns | 0.99 |
| array/accumulate/Int64/1d | 124532 ns | 125170 ns | 0.99 |
| array/accumulate/Int64/dims=1 | 83259 ns | 83519 ns | 1.00 |
| array/accumulate/Int64/dims=2 | 157436 ns | 158002 ns | 1.00 |
| array/accumulate/Int64/dims=1L | 1709001 ns | 1709945.5 ns | 1.00 |
| array/accumulate/Int64/dims=2L | 965915 ns | 966571 ns | 1.00 |
| array/accumulate/Float32/1d | 108726 ns | 109737 ns | 0.99 |
| array/accumulate/Float32/dims=1 | 80262 ns | 80823.5 ns | 0.99 |
| array/accumulate/Float32/dims=2 | 147251 ns | 147778 ns | 1.00 |
| array/accumulate/Float32/dims=1L | 1618529 ns | 1619194 ns | 1.00 |
| array/accumulate/Float32/dims=2L | 698100 ns | 698530 ns | 1.00 |
| array/construct | 1267.6 ns | 1279.85 ns | 0.99 |
| array/random/randn/Float32 | 42943 ns | 47253.5 ns | 0.91 |
| array/random/randn!/Float32 | 24898 ns | 24573 ns | 1.01 |
| array/random/rand!/Int64 | 27331 ns | 27294 ns | 1.00 |
| array/random/rand!/Float32 | 8769.666666666666 ns | 8724.333333333334 ns | 1.01 |
| array/random/rand/Int64 | 29760 ns | 29633 ns | 1.00 |
| array/random/rand/Float32 | 12950 ns | 12902 ns | 1.00 |
| array/permutedims/4d | 59908 ns | 61250.5 ns | 0.98 |
| array/permutedims/2d | 53801 ns | 54865 ns | 0.98 |
| array/permutedims/3d | 54901 ns | 55511 ns | 0.99 |
| array/sorting/1d | 2756214.5 ns | 2757710 ns | 1.00 |
| array/sorting/by | 3368533 ns | 3344132.5 ns | 1.01 |
| array/sorting/2d | 1087613 ns | 1080389 ns | 1.01 |
| cuda/synchronization/stream/auto | 1052.5454545454545 ns | 1015.8333333333334 ns | 1.04 |
| cuda/synchronization/stream/nonblocking | 7580.2 ns | 7618.9 ns | 0.99 |
| cuda/synchronization/stream/blocking | 812.78125 ns | 799.1530612244898 ns | 1.02 |
| cuda/synchronization/context/auto | 1181.4 ns | 1164.1 ns | 1.01 |
| cuda/synchronization/context/nonblocking | 8037.299999999999 ns | 7651.4 ns | 1.05 |
| cuda/synchronization/context/blocking | 925.2666666666667 ns | 895.8490566037735 ns | 1.03 |
This comment was automatically generated by workflow using github-action-benchmark.
Do not merge. This PR is not marked as a draft so that the benchmarks run.